Summary


Objectives 

The  overall  goal  of  the  project  was  to  develop  new  foundations  and  architectures  for  secure  and  reliable 
wireless  sensor  network  (WSN)  applications.  Main  subgoals  included  the  following: 

1.  To  develop  a  new  middleware  system  supporting  secure  RPC  communications  in  WSNs. 

2. To develop new foundations and practical language architectures for type-safe program
specialization, aka staged programming, in WSNs.

3.  To  develop  a  software  framework  for  reliable  association  of  provenance  information  with  datasets. 

Accomplishments 

In  this  Section  we  detail  accomplishments  with  respect  to  each  of  the  objectives  mentioned  above. 

Objective  1 

Our work on secure RPC in WSNs focused on tools for the popular TinyOS programming environment. A
major contribution was an extension of the nesC programming language, called SpartanRPC, that provides
a secure RPC abstraction and an expressive and fine-grained security policy language. SpartanRPC allows
multiple networks in distinct security domains to interact without resorting to other hardware
intermediaries.

Our  system  includes  an  asynchronous  RPC  abstraction  based  on  nesC  wiring  constructs.  The  benefit  of 
this  approach  is  that  experienced  nesC  programmers  will  readily  understand  the  usage  of  novel  SpartanRPC 
constructs.  Furthermore,  programmers  can  specify  dynamic  endpoints  of  wirings,  to  accommodate  changes 
in  network  configurations. 

RPC services are secured with authorization policies mediating access. These policies are written in
an expressive language that supports distributed, decentralized specification and maintenance. Individual
security domains are able to manage their own policy specifications. For network communications, our
underlying protocols use public keys to support an open-world security model, where security domains
do not need to share secrets a priori. Rather, authorization is established via signed credentials
communicated over the air. Because public key signature verification is very costly in WSNs, our protocols
also incorporate symmetric session key negotiations for efficient communications with authorized actors.
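
The cost structure described above (one expensive credential check per peer, followed by cheap
symmetric authentication) can be sketched in Python. This is only an illustration under stated
assumptions, not the SpartanRPC protocol: `RpcServer`, `make_tag`, and the `verify_credential`
callback are hypothetical names, the callback stands in for public-key signature verification, and
HMAC-SHA256 stands in for whatever symmetric primitive the real system uses. The session key is
simply returned to the caller here, whereas a real protocol would negotiate it so that it is never
sent in the clear.

```python
import hashlib
import hmac
import os


class RpcServer:
    """Toy server: one costly credential check per client, then cheap MACs."""

    def __init__(self, verify_credential):
        # verify_credential is a hypothetical stand-in for public-key
        # signature verification of a signed credential.
        self._verify_credential = verify_credential
        self._session_keys = {}  # client id -> symmetric session key
        self.verifications = 0   # number of expensive checks performed

    def authorize(self, client_id, credential):
        """Run the expensive check once; return a fresh session key or None."""
        self.verifications += 1
        if not self._verify_credential(client_id, credential):
            return None
        key = os.urandom(16)
        self._session_keys[client_id] = key
        return key

    def accept(self, client_id, payload, tag):
        """Accept a message only if its MAC matches the client's session key."""
        key = self._session_keys.get(client_id)
        if key is None:
            return False
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


def make_tag(key, payload):
    """Client side: authenticate a payload with the session key."""
    return hmac.new(key, payload, hashlib.sha256).digest()
```

In this sketch, any number of calls to `accept` after a single successful `authorize` leave
`verifications` at 1, which mirrors the claim that overhead is concentrated in the initial
authorization period.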

Performance overhead of this system was measured using a variety of metrics. The majority of
computational costs are incurred during initial authorization periods between network nodes, when
credentials are verified and session keys are computed. Following this short transient state (90 seconds
to several minutes, depending on network density), normal network communications proceed with relatively
little overhead. General impacts are summarized in Fig. 1. To demonstrate memory overhead, we compare
RAM and ROM bytes consumed in a simple client-server test harness, implemented in “baseline” fashion
with no security, and implemented in a secure fashion using the SpartanRPC framework. We also illustrate
the impact on maximum messaging rates by comparing such rates in the baseline implementation with both
insecure and secure RPC versions. The latter data show that the most significant messaging overhead
incurred by our system comes from the security features, not from the RPC abstraction provided in
SpartanRPC.

Figure 1: SpartanRPC Memory Overhead (L) and Impact on Messaging (R)

Figure 2: Scalaness/nesT Compilation and Execution Model


Objective  2 

Due to resource constraints in WSNs, application efficiency is extremely important. At the same time,
complex, distributed algorithms are typical application components. We proposed program specialization,
aka staged programming, as a technique to address both these issues. In a staged programming language,
code is treated as a datatype, and programs can be dynamically generated and subsequently executed. We
envisioned that a staged programming language could be executed on a high-powered WSN network “hub”
device that could dynamically generate specialized, efficient WSN node programs based on current
conditions. A particular novelty of our approach was a focus on type safety, to ensure that type-safe hub
programs are guaranteed to generate type-safe node programs. We call this cross-stage type safety. This
is especially important when hubs are remote and not accessible to operators who could fix type errors
in generated code.
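
The generate-then-execute pattern described above can be illustrated with a small Python sketch. It
is not Scalaness/nesT: `generate_node_program`, `load_node_program`, and the generated `node_step`
function are hypothetical, and the node program is a made-up filter specialized to two constants.
Python offers no static cross-stage guarantee, so the runtime type checks below are only a crude
stand-in for what a cross-stage type system would establish before the hub program ever runs.

```python
def generate_node_program(sample_period_ms, threshold):
    """Hub stage: emit source for a node program specialized to constants."""
    # Crude stand-in for cross-stage type checking: refuse to generate code
    # from ill-typed inputs. A staged type system would reject these
    # statically instead of at generation time.
    if type(sample_period_ms) is not int or sample_period_ms <= 0:
        raise TypeError("sample_period_ms must be a positive int")
    if type(threshold) is not int:
        raise TypeError("threshold must be an int")
    # The constants are baked into the generated source, so the node program
    # carries no configuration logic of its own.
    return (
        "def node_step(reading, elapsed_ms):\n"
        f"    if elapsed_ms % {sample_period_ms} != 0:\n"
        "        return None\n"
        f"    return reading if reading > {threshold} else None\n"
    )


def load_node_program(source):
    """Node stage: turn the generated source into a callable."""
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["node_step"]
```

For example, `load_node_program(generate_node_program(1000, 20))` yields a function that reports a
reading only on 1000 ms boundaries and only when it exceeds 20.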

Although a variety of staged programming languages exist, they have not been designed with cross-stage
type safety, or with an execution model that reflects WSN hardware architecture. Thus, a number of
foundational issues remained open. We addressed these issues in the (ML) language, which combined a
core-ML language with staging features. Our main result was a formal proof of cross-stage type safety for
(ML), though we also resolved a variety of practical language design issues.

On this foundation, we next developed a practical staged language for developing real WSN applications.
This language, called Scalaness/nesT, extends Scala with staging features for executing programs
on hubs that generate type-safe nesC programs for subsequent deployment to WSN nodes. The language
compilation and execution model is described in Fig. 2. Of particular note here is the fact that cross-stage
type safety of Scalaness source code ensures that compiled bytecode can be deployed to, and run on, hubs
without fear of generating ill-typed specialized WSN node programs.

To explore and demonstrate practical applications of Scalaness/nesT, especially in a security setting,
we re-implemented the SpartanRPC system in a staged style. In particular, we offloaded credential
verification and symmetric key computations to a Scalaness program on a hub that generates specialized node
programs with embedded session keys for secured network communications. This system is illustrated in
Fig. 4. A staged approach to SpartanRPC communications has a significant effect on RAM and ROM usage,
as summarized in Fig. 3. We compared three software versions of a client-server application: one with
no security mechanisms in place, one with unstaged SpartanRPC protocols in place, and one generated by
Scalaness evaluation in our staged version of the SpartanRPC protocol. The “Savings” in this figure are the
percent reductions from the unstaged to the staged secure implementation, and these numbers show that the
potential for saving both RAM and ROM space using staging is quite significant. This is especially
important, since it changes the delicate balance between the security and language abstraction benefits of
SpartanRPC and the associated implementation overhead.

Figure 3: Comparison of Staged and Unstaged Versions of SpartanRPC-secured Application

Figure 4: Staging Authorization and Authorized Access in a Multi-Domain WSN.

Objective  3 

Wireless Sensor Networks typically produce time-series data that is frequently intended for the public
domain. But even if data is shared publicly, it is still important to maintain metadata, especially provenance,
for proper understanding and attribution of data sources. Most metadata schemes for environmental data
are based on imposed structure and annotations, e.g. XML formats. However, this approach is brittle, due
to the typical practices of domain scientists, who are mainly interested in the data, who perform analysis
in spreadsheets, and who are likely to “throw away the (metadata) wrapper”.

Thus, we proposed a scheme for embedding identifiers directly in time series data, and associating
those identifiers with web-accessible provenance information. This scheme is called self-identifying data.
Self-identifying data leverages noise in sensor readings, in a manner similar to existing watermarking
techniques. To anticipate manipulations of data common in scientific practice, the scheme is robust to data
sampling, reordering, and truncation. The scheme is not secure, in the sense that it can be easily
subverted by a malicious actor; rather, it is intended to support “Fair-Use” data sharing policies between
well-intentioned actors. Support for Fair Use policies is especially important, since concerns about proper
attribution are one of the most common barriers to data sharing.

Figure 5: Anatomy of an Annotated Datapoint

Figure 6: Predicted and observed probability of recovering provenance mark.

Our technique is illustrated in Fig. 5. In each datapoint, some range of least significant bits is reserved
for metadata embedding; the size of this range depends on the accuracy and precision of sensor readings.
More significant bits are left untouched. Bits reserved for metadata are used to encode a fragment
of the embedded provenance identifier, while the parameter and mark check bits are used to determine
fragment ordering and validity for reconstructing complete marks. Because each marked datapoint
contains only a fragment of the complete provenance identifier, identifier reconstruction requires multiple
datapoints.
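
The embedding idea above can be sketched in Python as follows. The bit layout is an assumption made
purely for illustration: a 16-bit identifier split into four 4-bit fragments, with each integer
reading giving up its low 8 bits to a 2-bit fragment index, the 4-bit fragment, and a toy 2-bit
check value. The actual field sizes, the check function, and the way fragments are assigned to
datapoints are not specified here, and the names `embed`, `mark_series`, and `recover` are
hypothetical.

```python
FRAG_BITS = 4                 # identifier bits carried by one datapoint
INDEX_BITS = 2                # which fragment this is (ordering)
CHECK_BITS = 2                # toy validity check over index and fragment
NUM_FRAGS = 1 << INDEX_BITS   # 4 fragments, i.e. a 16-bit identifier
MARK_BITS = INDEX_BITS + FRAG_BITS + CHECK_BITS
FRAG_MASK = (1 << FRAG_BITS) - 1
MARK_MASK = (1 << MARK_BITS) - 1


def _check(index, fragment):
    """Toy 2-bit check value; a real scheme would use something stronger."""
    return (index ^ fragment ^ (fragment >> 2)) & 0b11


def embed(reading, index, identifier):
    """Overwrite the low MARK_BITS of an integer reading with one fragment."""
    fragment = (identifier >> (index * FRAG_BITS)) & FRAG_MASK
    mark = ((index << (FRAG_BITS + CHECK_BITS))
            | (fragment << CHECK_BITS)
            | _check(index, fragment))
    # The more significant bits of the reading are left untouched.
    return ((reading >> MARK_BITS) << MARK_BITS) | mark


def mark_series(readings, identifier):
    """Mark a series, cycling through fragment indices (an assumption)."""
    return [embed(r, i % NUM_FRAGS, identifier) for i, r in enumerate(readings)]


def recover(datapoints):
    """Rebuild the identifier from any sample that covers every fragment."""
    fragments = {}
    for value in datapoints:
        mark = value & MARK_MASK
        index = mark >> (FRAG_BITS + CHECK_BITS)
        fragment = (mark >> CHECK_BITS) & FRAG_MASK
        if (mark & 0b11) == _check(index, fragment):
            fragments.setdefault(index, fragment)
    if len(fragments) < NUM_FRAGS:
        return None  # not enough distinct fragments in this sample
    return sum(fragments[i] << (i * FRAG_BITS) for i in range(NUM_FRAGS))
```

Because every datapoint carries its own fragment index, `recover` does not depend on the order of
its input, so a reordered, subsampled, or truncated series still yields the identifier as long as
all four fragments appear at least once.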

In Fig. 6, we show both a combinatorial prediction and empirical observations of the probability of
recovering a provenance identifier given a particular data sample size. Empirical observations for several
datasets are shown. In summary, our results illustrate that a sample of 30-40 distinct datapoints is adequate
for a near-100% probability of provenance identifier recovery. In addition to this analysis, we have
illustrated a practical application of our scheme in a publicly available dataset generated by environmental
monitoring networks. This dataset is paired with a web tool that automatically extracts provenance
identifiers from datasets and redirects to webpages with provenance information.
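
One simple way to form a combinatorial prediction of this kind is the coupon-collector coverage
probability, sketched below in Python. This is an assumed model, not necessarily the one behind
Fig. 6: it supposes that each sampled datapoint carries one of `num_fragments` fragments chosen
uniformly and independently, and it ignores check-bit failures. The function name and parameters
are hypothetical, and the number of fragments used in the actual scheme is not stated here.

```python
from math import comb


def recovery_probability(num_fragments, sample_size):
    """Probability that a sample contains every fragment at least once.

    Uses inclusion-exclusion over the set of missing fragments, assuming
    each datapoint carries a uniformly random, independent fragment.
    """
    k, n = num_fragments, sample_size
    return sum(
        (-1) ** j * comb(k, j) * ((k - j) / k) ** n
        for j in range(k + 1)
    )
```

Under this model the probability is zero whenever the sample is smaller than the number of
fragments and rises toward one as the sample grows, which matches the qualitative shape of a
recovery curve plotted against sample size.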